Skip to content

WEB: Obfuscating workgroup email addresses to fix Issue #51209 #51266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 24, 2023

Conversation

Kabiirk
Copy link
Contributor

@Kabiirk Kabiirk commented Feb 9, 2023

Hi,

As defined by the scope of this Issue #51209 , I have implemented a JavaScript based solution for obfuscating workgroup emails:

If required, please let me know whether any Tests/Code Checks need to be added/passed for this or if any other change needs to be implemented to close this PR successfully .

Thanks


@mroeschke mroeschke added the Web pandas website label Feb 9, 2023
@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 10, 2023

Hi @mroeschke & @datapythonista ,

Going through the details in Code Checks / pre-commit (pull_request), it seems there is a trailing whitespace in my code that is causing the issue. There is an extra whitespace at line 59 in team.md (Please ref. screenshot below). I am making the change and commiting again in the same PR.

Ref.
image

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 10, 2023

@github-actions pre-commit

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 11, 2023

Okay,

So the Code Checks / pre-commit (pull_request) is passing now.
But I am not sure why Ubuntu / Data Manager (pull_request) action is now failing for this PR. I have gone through the action details/summary and and it seems there is an AssertionError in this function/code-block of pandas/util/_test_decorators.py :
image

But as per my understanding, this should run if the platform is Windows (line 250), so why is it being executed in an Ubuntu based check ? Maybe there is some gap in my understanding, please clarify & let me know the way forward.

Thanks

@datapythonista
Copy link
Member

Preview docs at: https://pandas.pydata.org/preview/51266/

@datapythonista
Copy link
Member

Thanks @Kabiirk, this seems to be working as expected: https://pandas.pydata.org/preview/51266/about/team.html and the CI failures seem unrelated to this PR.

I'd personally avoid having the real addresses in the html as said in the issue. I think this is an improvement, but with the approach I proposed we could avoid scrappers using a regex for the email addresses, not only the ones that extract it from the a tag analyzing the dom.

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 14, 2023

@datapythonista Per my understanding, you wanted to prepend text to the email initially, then have a JS script change that text.

Not sure how the initial prepending of text would work though, since prepending & removing text with the same JS script would have no effect because the original email would still populate the DOM first, then be worked upon (prepend + removal). For this let me see if I can do anything with Event Listeners or defer/onLoad property in JS to make sure template loads the wrong email first, then JS script runs after that and changes it.

Alternatively, emails from the source itself would need to have a prefix, which the JS script removes (I don't think that's a good Idea).

I'll make changes in this PR itself.

@datapythonista
Copy link
Member

Alternatively, emails from the source itself would need to have a prefix, which the JS script removes (I don't think that's a good Idea).

Not in the source (the config file), but in the template, literally in the same line where you remove the prefix again via javascript. It's not great to have to do this, but I think it's a much simpler approach than any alternative that prevents the email addresses to be in the html, while they are made available to users.

Thanks @Kabiirk

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 14, 2023

Hi @datapythonista,

I have implemented the proposed change. As expected, scraper picks up the wrong email (i.e. with the prefix):
image

While the correct email is visible to the end-user after the<script></script> tag executes:
image

Moreover, I was able to completely eliminate the use of workgroup.contact from the <script></script> tags. So even if someone takes a raw dump of the HTML response with requests.get(TEAM_URL).content.decode(), they will not get the email from within the<script></script>. At least not without manual effort.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Kabiirk, added couple of suggestions to simplify the code, but looks great.

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 16, 2023

I'll make the suggested changes in my local branch & commit again.

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 18, 2023

Hi,

I tested this and it works, have committed the same to PR. Let me know the way forward.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice @Kabiirk, looks good to me.

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 23, 2023

Hi @datapythonista ,

Following up on your approval, let me know if this PR is supposed to be merged by me, or it would be merged by you ?

@datapythonista
Copy link
Member

Sorry for the delay @Kabiirk. We usually want two core developers to review a PR before it gets merged, to be in the safe side. I think we're both done here, let's see if another core dev can do the second review soon, and when they do, they'll merge. Thanks again for the help with this!

@mroeschke mroeschke added this to the 2.1 milestone Feb 24, 2023
@mroeschke mroeschke merged commit e3c2a5e into pandas-dev:main Feb 24, 2023
@mroeschke
Copy link
Member

Thanks @Kabiirk

@Kabiirk
Copy link
Contributor Author

Kabiirk commented Feb 25, 2023

Thanks a lot @datapythonista & @mroeschke

No worries, thanks for clarifying that.
I understand it can be a challenge to comprehensively look at so many PRs & give in-depth inputs ! Really appreciate your help & suggestions on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Web pandas website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WEB: Obfuscate workgroup email addresses
3 participants